N-gram Language Modeling of Japanese Using Prosodic Boundaries
Abstract
A new method was developed to incorporate prosodic boundary information into statistical language modeling. The method counts word transitions separately for two cases: transitions that cross an accent phrase boundary and transitions that do not. Calculating these two types of word transitions directly would require a large speech corpus, which is impractical to build, so bigram counts of part-of-speech (POS) transitions were first calculated separately for the two cases on a small speech corpus. Word bigram counts calculated from a large-scale text corpus were then divided into the two cases according to the POS transition feature, yielding two word bigram models: one for transitions crossing accent phrase boundaries and one for transitions not crossing them. The method was evaluated by the perplexity reduction of the proposed models relative to the baseline models. With correct boundary positions, the reduction reached 11%; with boundaries extracted by our previously developed method based on mora-F0 transition modeling, it was 8%. A reduction of around 6% was still observed for speech uttered by a speaker different from the speaker of the corpus used to calculate the POS bigram counts.
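The count-splitting step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say exactly how word bigram counts are divided, so the proportional split below, the one-POS-per-word lookup `pos_of`, the fallback `default_ratio`, and the add-alpha smoothing in `bigram_prob` are all assumptions made for illustration, as are the function names.

```python
from collections import defaultdict


def pos_boundary_ratios(tagged_utterances):
    """Estimate, per POS bigram, the fraction of transitions that cross
    an accent phrase boundary, from a small boundary-labeled speech corpus.

    Each utterance is a list of (word, pos, boundary_before) tuples, where
    boundary_before is True if an accent phrase boundary precedes the word.
    """
    crossing = defaultdict(int)
    total = defaultdict(int)
    for utt in tagged_utterances:
        for (_, pos1, _), (_, pos2, boundary_before) in zip(utt, utt[1:]):
            total[(pos1, pos2)] += 1
            if boundary_before:
                crossing[(pos1, pos2)] += 1
    return {pair: crossing[pair] / total[pair] for pair in total}


def split_word_bigrams(word_bigram_counts, pos_of, ratios, default_ratio=0.5):
    """Divide text-corpus word bigram counts into crossing / non-crossing
    parts in proportion to the POS transition ratio (assumed split rule).

    pos_of maps each word to a single POS tag (a simplification);
    default_ratio is a hypothetical fallback for unseen POS bigrams.
    """
    crossing, within = {}, {}
    for (w1, w2), count in word_bigram_counts.items():
        r = ratios.get((pos_of[w1], pos_of[w2]), default_ratio)
        crossing[(w1, w2)] = count * r
        within[(w1, w2)] = count * (1.0 - r)
    return crossing, within


def bigram_prob(counts, w1, w2, vocab_size, alpha=1.0):
    """P(w2 | w1) from one of the two count tables, with add-alpha
    smoothing (an assumed smoothing scheme, not taken from the paper)."""
    numerator = counts.get((w1, w2), 0.0) + alpha
    history = sum(c for (a, _), c in counts.items() if a == w1)
    return numerator / (history + alpha * vocab_size)


# Toy data: a tiny boundary-labeled "speech corpus" and text-corpus counts.
speech = [
    [("kyou", "N", False), ("wa", "P", False), ("hare", "N", True)],
    [("ame", "N", False), ("ga", "P", False), ("furu", "V", True)],
]
text_counts = {("kyou", "wa"): 4, ("wa", "hare"): 10}
pos_of = {"kyou": "N", "wa": "P", "hare": "N"}

ratios = pos_boundary_ratios(speech)
crossing_counts, within_counts = split_word_bigrams(text_counts, pos_of, ratios)
```

At prediction time, a recognizer would consult the crossing table when a boundary is hypothesized between two words and the non-crossing table otherwise; how the boundary hypothesis is obtained is outside the scope of this sketch.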
Similar resources
Continuous Speech Recognition of Japanese Using Prosodic Word Boundaries Detected by Mora Transition Modeling of Fundamental Frequency Contours
An HMM-based method of detecting prosodic word boundaries was developed for Japanese continuous speech and was successfully integrated into a mora-basis continuous speech recognition system with two stages operating without and with prosodic information. The method is based on modeling the fundamental frequency (F0) contour of input speech as transitions of mora-unit F0 contours and operates af...
N-gram language modeling of Japanese using bunsetsu boundaries
A new scheme of N-gram language modeling was proposed for Japanese, where word N-grams were calculated separately for the two cases: crossing and not crossing bunsetsu boundaries. Here, bunsetsu is a basic grammatical (and pronunciation) unit of Japanese. A similar scheme using accent phrase boundaries instead of bunsetsu boundaries has already been proposed by the authors with a certain succes...
The role of prosodic boundaries in word discovery: Evidence from a computational model.
This study aims to quantify the role of prosodic boundaries in early language acquisition using a computational modeling approach. A spoken term discovery system that models early word learning was used with and without a prosodic component on speech corpora of English, Spanish, and Japanese. The results showed that prosodic information induces a consistent improvement both in the alignment of ...
Effects of prosodic boundaries on ambiguous syntactic clause boundaries in Japanese
We report the results of experiments designed to investigate the effects of prosodic boundaries on resolving ambiguous syntactic clause boundaries in Japanese. The head-final, prodrop nature of this language generates abundant syntactic attachment ambiguity for sentences that contain relative clauses. Two types of sentences with differing head nouns modified by relative clauses were examined. S...
Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling
This paper proposes an automatic prosodic labeling technique for constructing speech database used for speech synthesis. In the corpus-based Japanese speech synthesis, it is essential to use annotated speech data with prosodic information such as phrase boundaries and accent types. However, manual annotation is generally time-consuming and expensive. To overcome this problem, we propose an esti...
Publication date: 2002